###### Plot Original Data using Plotly ######plot_ly(df_teu, x =~as.Date(Date), y =~TEUs, type ='scatter', mode ='lines',line =list(color ='#ffa988')) %>%layout(title ="Orignial Time Series of Port of Los Angeles Container Throughput (TEUs) ",xaxis =list(title ="Date"),yaxis =list(title ="TEUs"))
The time series of container throughput (TEUs) at the Port of Los Angeles from 2000 to 2024 shows an overall upward trend with pronounced cyclical fluctuations, reflecting both long-term trade growth and sensitivity to global shocks. Throughput expanded steadily in the early 2000s before declining sharply during the 2008 financial crisis, after which it recovered and stabilized within a broad seasonal cycle. Beginning in 2020, the series exhibits extreme volatility: a sharp contraction during the onset of the COVID-19 pandemic was followed by record peaks in 2021 as consumer demand surged and supply chain congestion intensified. In recent years, volumes have moderated but remain elevated, underscoring both the vulnerability of the port to external disruptions and its resilience as a central hub in U.S.–China trade and global supply chains.
Code
ts_df <-ts(df_teu$TEUs, start =decimal_date(as.Date(min(df$Date))), frequency =12) # Assuming monthly datats_log <-ts(df_teu$log_TEUs, start =decimal_date(as.Date(min(df$Date))), frequency =12) # Assuming monthly dataplot(ts_df, main ="Time Series plot for orignial data")
Code
plot(ts_log, main ="Time Series plot for log data")
The log transformation does not yield a significant improvement in residual behavior, while complicating forecast back-transformation. Therefore, we retain the original scale of the data for analysis and forecasting.
ACF with original data
Code
# ACF PlotggAcf(ts_log) +ggtitle("ACF Plot of Port of Los Angeles Container Throughput") +theme_minimal()
ADF test
Code
################ ADF Test #############tseries::adf.test(ts_log)
Augmented Dickey-Fuller Test
data: ts_log
Dickey-Fuller = -6.0505, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
The ACF plot of the original series shows a slow decay in autocorrelations across multiple lags, indicating strong persistence and nonstationarity. In addition, the AD F test yields a p-value of 0.01, which is below the 0.05 significance threshold, rejecting the null hypothesis of a unit root.
Taken together, these results confirm that the raw TEUs series of the Port of Los Angeles is non-stationary.
Code
diff_1 <-diff(ts_df)diff_2 <-diff(ts_df,differences =2)# ACF Plots for seasonal without normal differenced Dataacf_plot1 <-ggAcf(diff_1, 40) +labs(title ="ACF of first differenced TEUs") acf_plot2 <-ggAcf(diff_2, 40) +labs(title ="ACF of second differenced TEUs") # Combine ACF Plots using patchworkacf_plot1 / acf_plot2
The ACF plot of the first-differenced series shows that autocorrelations drop sharply after lag 1 and remain largely within the confidence bounds, indicating that first differencing has effectively removed the nonstationarity of the original TEUs series. In contrast, the second-differenced series does not provide additional improvement and may risk overdifferencing, which can distort the underlying structure of the data.
Therefore, a first-order difference is sufficient。
Code
diff_s <-diff(diff_1,12)# ACF Plots for Differenced Dataacf_plot1 <-ggAcf(diff_1, 40) +labs(title ="ACF of first differenced TEUs") pacf_plot1 <-ggPacf(diff_1, 40) +labs(title ="PACF of first differenced TEUs") acf_plots <-ggAcf(diff_s, 40) +labs(title ="ACF of first differenced seasonal TEUs") pacf_plots <-ggPacf(diff_s, 40) +labs(title ="PACF of first differenced seasonal TEUs") # Combine ACF Plots using patchworkacf_plot1 / pacf_plot1
Code
acf_plots / pacf_plots
q=0:1,Q=1:3,p=1:4,P=1,d=1,D=1
Code
SARIMA.c <-function(p1, p2, q1, q2, P1, P2, Q1, Q2, data) { d <-1; D <-1if (!is.ts(data)) data <-ts(data, frequency =12) s <-frequency(data); if (is.null(s) ||is.na(s) || s <=1) s <-12 rows <-list() i <-1for (p in p1:p2) {for (q in q1:q2) {for (P in P1:P2) {for (Q in Q1:Q2) {if (p + d + q + P + D + Q <=9) { model <-tryCatch(Arima( data,order =c(p, d, q),seasonal =list(order =c(P, D, Q), period = s) ),error =function(e) NULL )if (!is.null(model)) { rows[[i]] <-c(p, d, q, P, D, Q, model$aic, model$bic, model$aicc) i <- i +1 } } } } } }if (length(rows) ==0) { out <-setNames(as.data.frame(matrix(numeric(0), nrow =0, ncol =9)),c("p","d","q","P","D","Q","AIC","BIC","AICc") )return(out) } out <-as.data.frame(do.call(rbind, rows))names(out) <-c("p","d","q","P","D","Q","AIC","BIC","AICc")for (nm innames(out)) out[[nm]] <-as.numeric(out[[nm]])rownames(out) <-NULL out}
SARIMA(0,1,4)(0,0,2)[12] is best model for auto search.
Code
# Set seed for reproducibilityset.seed(345)###### Fit SARIMA(1,1,1)(1,1,1)[12] Model ######model_1_output <-capture.output(sarima(ts_df, 4, 1, 1, 0, 1, 1, 12))
Among the three candidate models, SARIMA(1,1,1)(0,1,1)[12] is selected as the most appropriate.
Although the (4,1,1)(0,1,1)[12] specification yields a slightly lower AIC, several of its AR coefficients are statistically insignificant, suggesting potential overfitting and reduced interpretability. Similarly, the (0,1,4)(0,0,2)[12] model captures short-term dynamics but fails to incorporate seasonal differencing (D=1), which is inconsistent with the pronounced seasonality of the TEUs series. In contrast, the (1,1,1)(0,1,1)[12] model achieves a favorable balance, with competitive AIC/BIC values, statistically significant parameters, and a parsimonious structure that adequately reflects both the underlying trend and seasonal dependence. Therefore, it represents the most robust and reliable choice among the three models.
Code
###### Fit ARIMA(1,0,2)(0,1,1)[12] Model ######fit <-Arima(ts_df, order =c(1, 1, 1), seasonal =list( order=c(0, 1, 1), period=12))###### Display Model Summary ######summary(fit)
Series: ts_df
ARIMA(1,1,1)(0,1,1)[12]
Coefficients:
ar1 ma1 sma1
0.0417 -0.7492 -0.9615
s.e. 0.0864 0.0622 0.0784
sigma^2 = 7.009e+09: log likelihood = -3673.92
AIC=7355.85 AICc=7355.99 BIC=7370.48
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -2360.452 81454.1 47476.86 -315.7311 321.5409 0.5693415
ACF1
Training set -0.005805141
Code
# Forecast the next 365 periodsforecast_result <-forecast(fit, h =36)# Display forecast accuracyaccuracy(forecast_result)
ME RMSE MAE MPE MAPE MASE
Training set -2360.452 81454.1 47476.86 -315.7311 321.5409 0.5693415
ACF1
Training set -0.005805141
This plot presents the SARIMA(1,1,1)(0,1,1)[12] forecast for the next three years. The model suggests that container throughput at the Port of Los Angeles will remain at historically high levels, with clear seasonal fluctuations persisting. The mean forecast indicates relative stability without a strong upward or downward trend, while the widening confidence intervals highlight increasing uncertainty over time. Port activity is highly sensitive to global trade conditions, capacity constraints, and potential external shocks.
Code
###### Forecast for 36 Periods (3 Years) ######forecast_horizon <-36# 36 months (3 years)###### Plot Forecasts Using Mean, Naïve, Drift Methods, and ARIMA Fit ######autoplot(ts_df) +autolayer(meanf(ts_df, h = forecast_horizon), series ="Mean", PI =FALSE) +autolayer(snaive(ts_df, h = forecast_horizon), series ="Naïve", PI =FALSE) +autolayer(rwf(ts_df, drift =TRUE, h = forecast_horizon), series ="Drift", PI =FALSE) +autolayer(forecast(fit, h = forecast_horizon), series ="ARIMA Fit", PI =FALSE) +ggtitle("TEUs Forecast (Next 3 Year)") +xlab("Year") +ylab("TEUs") +guides(colour =guide_legend(title ="Forecast Methods")) +theme_minimal()
This plot compares SARIMA with alternative benchmark methods, including Drift, Mean, and Naïve forecasts. The Naïve model simply extends the last observation forward, failing to capture the seasonal structure of the series. The Mean method produces an overly smoothed, nearly constant forecast that lacks interpretive value for a dynamic system. The Drift model incorporates a trend component but similarly neglects the strong seasonal patterns present in the data. In contrast, the SARIMA model effectively captures both long-term dynamics and recurring seasonal cycles, producing forecasts that are more consistent with historical patterns.
As such, SARIMA provides a more realistic and reliable basis for understanding future container throughput compared to simpler benchmark methods.
Code
# Load required librarieslibrary(tsibble)library(prophet)prophet_data <- df_teu%>%select(Date, TEUs) %>%rename(ds = Date, y = TEUs ) # Rename for Prophet###### Fit the Prophet Model ######prophet_model <-prophet(prophet_data)###### Extend the Forecast Horizon by 120 Months (10 Years) ######future_dates <-make_future_dataframe(prophet_model, periods =36, freq ="month")###### Generate Forecasts ######forecast_data <-predict(prophet_model, future_dates)###### Plot Forecast ######plot(prophet_model, forecast_data) +ggtitle("Forecast TEUs (Next 3 Years)") +xlab("Date") +ylab("TEUs")
This forecast model differs from the SARIMA approach in several key ways. First, SARIMA explicitly incorporates seasonal differencing and seasonal parameters, which allows it to capture recurring 12-month cycles more precisely. In contrast, the model shown here reflects seasonal fluctuations indirectly through its fitted trend and widening confidence intervals, rather than parameterizing seasonality directly. Second, the SARIMA forecasts are typically smoother and preserve a clear seasonal pattern over time, whereas this model also reproduces seasonal swings but with wider uncertainty bands, indicating greater caution about long-term stability. Finally, while SARIMA is particularly well-suited for datasets with strong and stable seasonal patterns, the current model is more flexible in balancing trend and seasonal dynamics, though it may be less robust for long-horizon forecasts.
Code
###### Forecast Next 12 Months Using SARIMA(1,0,2)(0,1,1)[12] ######sarima.for(ts_df, 12, 1,1,1,0,1,1,12)
$pred
Jan Feb Mar Apr May Jun Jul Aug
2020 896182.2 804110.1 846735.9 883995.5 921785.0 909357.3 946982.0 977222.2
Sep Oct Nov Dec
2020 943340.7 951048.3 883549.4 899035.3
$se
Jan Feb Mar Apr May Jun Jul
2020 83814.42 87323.12 90060.13 92691.96 95250.12 97741.31 100170.56
Aug Sep Oct Nov Dec
2020 102542.27 104860.36 107128.30 109349.24 111526.52
The black line and points represent the historical container throughput (TEUs) from 2011 to 2020, showing notable fluctuations with an extreme dip around 2016. The red line and points indicate the forecasted values, which follow the historical pattern and maintain a relatively stable level without sharp deviations. The shaded gray area illustrates the confidence intervals, which widen over time, reflecting growing uncertainty in longer-term forecasts.
Overall, the model suggests that throughput at the Port of Los Angeles is expected to remain close to its historical average over the next year, with regular seasonal fluctuations but no significant structural shifts. The widening forecast intervals also highlight the potential influence of external economic, trade, or policy factors that could affect future outcomes.